refactor(data): Refactor `wordSearch` and other methods for clarity and maintainability #1436

tora-pan · 2023-02-28T20:00:14Z

Refactor `wordSearch` to use lookup object

Added a lookup object to more cleanly convert from katakana to hiragana.
Removed the nested if/for block and replaced it with a more concise convertToHiragana method.
Added an isKana method that returns whether the current character is indeed Japanese.
Renamed a few variables for code readability.
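
For readers skimming the description, here is a minimal sketch of the lookup-object idea. The table is a tiny, hypothetical excerpt and the function bodies are assumptions for illustration, not the code in this PR:

```typescript
// Hypothetical, heavily abbreviated lookup table; the real map covers many
// more characters than shown here.
const KATAKANA_TO_HIRAGANA: Readonly<Record<string, string>> = {
  'カ': 'か',
  'タ': 'た',
  'ナ': 'な',
  // Half-width katakana can be folded to full-width hiragana the same way.
  'ｶ': 'か',
};

// Illustrative helper: true for hiragana (U+3041 to U+3096) and
// katakana (U+30A1 to U+30FA).
function isKana(char: string): boolean {
  const code = char.charCodeAt(0);
  return (
    (code >= 0x3041 && code <= 0x3096) || (code >= 0x30a1 && code <= 0x30fa)
  );
}

// Replaces each character found in the table; anything else is kept as-is.
function convertToHiragana(word: string): string {
  let result = '';
  for (let i = 0; i < word.length; i++) {
    const char = word.charAt(i);
    result += KATAKANA_TO_HIRAGANA[char] ?? char;
  }
  return result;
}
```

With this excerpt, `convertToHiragana('カタカナ')` walks the string once and does a single table lookup per character, which is what replaces the nested if/for block.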

Testing

Cherry picked tests from separate branch to cover newly added methods and to make sure there were no regressions.

tora-pan · 2023-02-28T20:04:09Z

@melink14 Don't mind the extra commits at the end. I accidentally did a `git add .` and pushed (pushing my web-test-runner-config.js and some .vscode settings).

I did want to ask you about that. I have to update my executablePath to make sure Puppeteer works correctly. Do you think I should just add it to my global .gitignore or something?

Let me know what you think about the contributions and please let me know if there is anything that you think could be done differently, refactored, etc...

Thank you very much!

melink14

Based on the new screenshots that were pushed, it looks like there are some unexpected changes to how words are looked up (fewer words are found); we'll need to debug that.

extension/data.ts

extension/test/data_test.ts

codecov · 2023-03-01T00:35:40Z

Codecov Report

Merging #1436 (448994a) into main (b6e87d9) will increase coverage by 2.37%.
The diff coverage is 99.12%.

@@            Coverage Diff             @@
##             main    #1436      +/-   ##
==========================================
+ Coverage   79.62%   82.00%   +2.37%     
==========================================
  Files           7        8       +1     
  Lines        3004     3189     +185     
  Branches      189      189              
==========================================
+ Hits         2392     2615     +223     
+ Misses        607      570      -37     
+ Partials        5        4       -1

| Impacted Files | Coverage Δ | |
|---|---|---|
| extension/data.ts | `85.94% <98.08%> (+3.61%)` | ⬆️ |
| extension/character_info.ts | `100.00% <100.00%> (ø)` | |

melink14

Didn't look at the tests yet due to time constraints, but mostly nits with some high-level questions on whether the new code is functionally equivalent to the old code.

Also, it seems there are still some unexpected screenshot diffs! (title translations aren't continuing past the first word?)

extension/data.ts

melink14

Another thought: I wonder if we should just use wanakana library to do all this stuff: https://www.npmjs.com/package/wanakana

Seems to have what we need and is small.

edit: maybe doesn't handle half width...

melink14 · 2023-03-01T23:43:43Z

> I did want to ask you about that. I have to update my executablePath to make sure Puppeteer works correctly. Do you think I should just add it to my global .gitignore or something?

I forgot to answer this question, but this can probably be updated to fall back to an environment variable or command line option.

I might also test if it's even required anymore (since there should be some logic to find chrome in a normal way) but I think I was debugging some problems in WSL which made me add it originally.
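
A rough sketch of what that fallback could look like. The flag name `--executable-path=` and the variable name `CHROME_EXECUTABLE_PATH` are made up for illustration; nothing in the repo defines them:

```typescript
// Hypothetical helper: prefer a command line option, then an environment
// variable, and otherwise return undefined so the launcher can use its own
// default browser lookup.
function resolveExecutablePath(
  env: Record<string, string | undefined>,
  argv: readonly string[]
): string | undefined {
  const prefix = '--executable-path='; // assumed flag name
  const flag = argv.find((arg) => arg.startsWith(prefix));
  if (flag !== undefined) {
    return flag.slice(prefix.length);
  }
  // Assumed variable name; an empty string is treated as "not set".
  return env['CHROME_EXECUTABLE_PATH'] || undefined;
}
```

In the test runner config the result would presumably be passed as the Puppeteer launch option `executablePath`, e.g. `resolveExecutablePath(process.env, process.argv)`, so that nobody needs a locally edited config file.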

tora-pan · 2023-03-02T04:40:56Z

So I went ahead and commented out the executablePath: portion and the tests still run so I think I've found our solution. 👍

tora-pan · 2023-03-03T06:05:38Z

@melink14 all tests are passing now. Fixed a few logic issues. Going to do a bit more cleaning up of variable names and typing in the morning tomorrow.

🥂

melink14 · 2023-03-04T02:38:54Z

I see the images are still there, but that might be due to the presubmit not re-running to update them (since it doesn't re-run when a GitHub Action pushes a commit).

I'll close and reopen the pull request to get it to re-run. (I should add a comment or label command for it...)

extension/data.ts

extension/types/hiraganaLookupMap.ts

extension/types/unicode-constants.ts

tora-pan · 2023-03-04T05:49:44Z

> Another thought: I wonder if we should just use wanakana library to do all this stuff: https://www.npmjs.com/package/wanakana
>
> Seems to have what we need and is small.
>
> edit: maybe doesn't handle half width...

This hasn't been updated in years but looks like it does what we need it to do?
https://github.com/yomotsu/japanese-string-utils
Maybe we could either use it or get "hints" from it?

melink14 · 2023-03-08T14:01:22Z

> This hasn't been updated in years but looks like it does what we need it to do?
> yomotsu/japanese-string-utils
> Maybe we could either use it or get "hints" from it?

Might be worth trying, though if we found problems I guess we'd need to fork it.

melink14

I still haven't looked at the tests yet, but I think the implementation is probably pretty close.

melink14 · 2023-03-08T14:08:24Z

extension/data.ts

+    for (let i = 0; i < str.length; i++) {
+      const char = str.charAt(i);
+      const nextChar = i + 1 <= str.length - 1 ? str.charAt(i + 1) : null;
+      const nextCharCode = nextChar?.charCodeAt(0);


It's more clear if we store the characters as unicode as discussed in the other thread.
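
One possible reading of this suggestion, sketched with hypothetical names (only `KANA.HW_KATAKANA_END` appears in the PR itself; the other constants and both helpers are assumptions): work with numeric code units and named range constants instead of juggling `char`/`nextChar` strings.

```typescript
// Hypothetical range constants, using the standard Unicode block boundaries.
const KANA = {
  HIRAGANA_START: 0x3041,
  HIRAGANA_END: 0x3096,
  KATAKANA_START: 0x30a1,
  KATAKANA_END: 0x30fa,
  HW_KATAKANA_START: 0xff66,
  HW_KATAKANA_END: 0xff9f,
} as const;

// Converts the string to UTF-16 code units once, up front.
function toCodeUnits(str: string): number[] {
  const codes: number[] = [];
  for (let i = 0; i < str.length; i++) {
    codes.push(str.charCodeAt(i));
  }
  return codes;
}

function isHalfWidthKatakana(code: number): boolean {
  return code >= KANA.HW_KATAKANA_START && code <= KANA.HW_KATAKANA_END;
}
```

With the codes in an array, the "next character" is simply `codes[i + 1]`, which is `undefined` at the end of the string, so no separate bounds check or `null` is needed.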

extension/data.ts

melink14 · 2023-03-08T14:12:16Z

extension/data.ts

-    0x307c,
-  ];
-  cs: number[] = [0x3071, 0x3074, 0x3077, 0x307a, 0x307d];
+  convertToHiragana(str: string): string {


I might have said this elsewhere, but I definitely recommend a JSDoc for this method and maybe a tweak to the name, since it also has a truncation function now (which we added back for consistency).

What do you mean by the truncation portion? If you mean the stripping of the ZWJ and tilde, etc., then that happens before it is sent to the convertToHiragana method.

Here is what I currently have using TS-Doc:

What do you think?

It's at line 352; I left a comment there for easy reference.

The TS doc is solid but I'll leave some pedantic comments (mostly conforming to best practices developed in my day job but that have served me well)

melink14 · 2023-03-12T14:49:35Z

extension/data.ts

+        currentCharCode <= KANA.HW_KATAKANA_END;
+      let key = '';
+      if (currentCharCode < 0x3000) {
+        break;


This is the truncation I meant; with this line the string will be truncated at the first not kana/kanji character.
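
To make the effect concrete, here is a stripped-down sketch of just that check. The function and constant names are made up, and the `0x3000` threshold is only an approximation of "kana/kanji" taken from the line above:

```typescript
// Hypothetical name for the 0x3000 threshold used in the diff; code units
// below it (ASCII, Latin, most punctuation) stop the scan.
const FIRST_CJK_CODE = 0x3000;

// Returns the leading run of characters at or above the threshold.
function truncateAtFirstNonJapanese(text: string): string {
  let result = '';
  for (let i = 0; i < text.length; i++) {
    if (text.charCodeAt(i) < FIRST_CJK_CODE) {
      break; // Everything from here on is dropped.
    }
    result += text.charAt(i);
  }
  return result;
}
```

So an input like `'カタカナ abc'` yields only `'カタカナ'`: the ASCII space stops the loop and the rest never reaches the conversion.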

melink14 · 2023-03-12T14:52:11Z

extension/data.ts

-    0x307c,
-  ];
-  cs: number[] = [0x3071, 0x3074, 0x3077, 0x307a, 0x307d];
+  convertToHiragana(str: string): string {


It's at line 352; I left a comment there for easy reference.

The TS doc is solid but I'll leave some pedantic comments (mostly conforming to best practices developed in my day job but that have served me well)

melink14 · 2023-03-12T14:53:30Z

extension/character_info.ts

@@ -0,0 +1,186 @@
+export const kanaToHiraganaNormalizationMap: Record<string, string> = {


Should this be constant case? (and readonly I guess)
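
Something along these lines, as a sketch (the entries are a made-up excerpt, `export` is left off so the snippet stands alone, and `Object.freeze` is an optional extra on top of the `Readonly` type):

```typescript
// Constant-case name plus a Readonly type so TypeScript rejects writes;
// Object.freeze additionally prevents mutation at runtime.
const KANA_TO_HIRAGANA_NORMALIZATION_MAP: Readonly<Record<string, string>> =
  Object.freeze({
    'ア': 'あ',
    'カ': 'か',
    'ｶ': 'か',
  });

// Reads work as before; an assignment such as
// KANA_TO_HIRAGANA_NORMALIZATION_MAP['ア'] = 'x' would be a compile error.
const normalized: string | undefined = KANA_TO_HIRAGANA_NORMALIZATION_MAP['カ'];
```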

melink14 · 2023-03-12T14:56:45Z

extension/data.ts

+    );
+  }
+  /**
+   * Returns the input string converted into hiragana. If any characters are not


nit: It's good to reference the inputs explicitly in the summary statement so that you don't need to use an @param. The @ markers can be useful for additional information that doesn't flow naturally in the summary statement, but there's no need to repeat information there that can be read once at the beginning!

So in this case something like:

Returns the given `kanaWord` with katakana and half width characters converted to full-width hiragana. ヴ is not converted. If a non-kana character is found in `kanaWord`, that and the following characters are omitted from the returned conversion.
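
For illustration, this is roughly how that summary style could sit on a method. The body is a simplified stand-in rather than the PR's implementation: it skips half-width handling and treats anything outside the two basic kana ranges (including ー) as non-kana, so the doc comment is narrowed to match.

```typescript
/**
 * Returns the given `kanaWord` with katakana converted to hiragana. ヴ is not
 * converted. If a non-kana character is found in `kanaWord`, that and the
 * following characters are omitted from the returned conversion.
 */
function toHiragana(kanaWord: string): string {
  let result = '';
  for (let i = 0; i < kanaWord.length; i++) {
    const char = kanaWord.charAt(i);
    const code = kanaWord.charCodeAt(i);
    const isHiragana = code >= 0x3041 && code <= 0x3096;
    const isKatakana = code >= 0x30a1 && code <= 0x30f6;
    if (!isHiragana && !isKatakana) {
      break;
    }
    // Katakana sits 0x60 above the matching hiragana; ヴ is passed through.
    result +=
      isKatakana && char !== 'ヴ' ? String.fromCharCode(code - 0x60) : char;
  }
  return result;
}
```

Note that the summary names `kanaWord` directly and describes only observable behavior, so there is no separate @param line and nothing about how the conversion is carried out.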

melink14 · 2023-03-12T14:59:12Z

extension/data.ts

+  }
+  /**
+   * Returns the input string converted into hiragana. If any characters are not
+   * found in the [NormalizationMap](./character_info.ts) then the character


This information is all true but it's more of an implementation detail vs a spec. Ideally, the doc string should be about the goal of the method and needn't go into detail about how we accomplished that. Basically, anything that could be changed without updating a test, shouldn't go in the doc string.

tora-pan and others added 8 commits February 28, 2023 06:02

refactor(data): Refactor wordSearch to use lookup

7c94ddc

refactor(data): Remove missed console.logs

2ea4ed6

refactor(data): Cherry pick test from other branch to follow contrib guidelines

e509cd3

refactor(data): Remove commented out code

5b7d512

Revert "refactor(data): Cherry pick test from other branch to follow contrib guidelines"

b0770b5

This reverts commit e509cd3.

Revert "refactor(data): Remove commented out code"

1e093f1

This reverts commit 5b7d512.

refactor(data): Remove accidently added file

fcea7eb

test(visual): Update baselines with new screenshots

968c518

melink14 requested changes Mar 1, 2023

Merge branch 'main' into refactor(data)_refactor-word-search

ddb8635

tora-pan and others added 4 commits March 1, 2023 05:23

refactor(data): Fix regression bug and clean up test styling

ba9349a

refactor(data): Fix logic causing regression and test styling

8415bfa

refactor(data): Replace null type to wordSearch

912fd4a

test(visual): Update baselines with new screenshots

2f5637b

melink14 requested changes Mar 1, 2023

melink14 reviewed Mar 1, 2023

tora-pan and others added 3 commits March 2, 2023 21:10

refactor(data): Comment out exec path for web-test-runner

35a9eb6

refactor(data): Fix logic causing regressions

8b5a4ce

Merge branch 'main' into refactor(data)_refactor-word-search

bea4823

github-actions bot and others added 6 commits March 3, 2023 06:07

style: Fix lint/formatting errors

5a7a430

refactor(data): Fix merge conflicts from different stashes

5994c34

refactor(data): Fix stashing mess

4cf1bee

test(visual): Update baselines with new screenshots

f9c922c

refactor(data): Move normalizationMap to own file

a457d52

style: Fix lint/formatting errors

912fc7a

melink14 requested changes Mar 4, 2023

melink14 closed this Mar 4, 2023

melink14 reopened this Mar 4, 2023

melink14 changed the title ~~Refactor(data) refactor word search~~ refactor(data): Refactor wordSearch and other methods for clarity and maintainability Mar 4, 2023

refactor(data): Revert deinflect method and types

a9815b7

tora-pan and others added 5 commits March 4, 2023 05:30

refactor(data): Add non kana break to hiragana conversion

5e6f6d2

refactor(data): Fix selection regression bug, remove console log

9e9af02

test(visual): Update baselines with new screenshots

4b4ca61

refactor(data): Revert execPath for web-test-runner-config

61878fe

test(visual): Update baselines with new screenshots

ccc8bcd

melink14 requested changes Mar 8, 2023

refactor(data): Add explicit check for halfWidthKatakana to converToHiragana

92431c4

melink14 requested changes Mar 12, 2023

Merge remote-tracking branch 'upstream/main' into refactor(data)_refactor-word-search

448994a
